CACTUS: A User-friendly Toolkit for Semantic Categorization and Clustering in the Open Domain

نویسنده

  • Emily Jamison
چکیده

In this paper, we present a tool for the semantic categorization and clustering of opendomain named entities (NEs) and common nouns (CNs), the Categorization And Clustering Tool for User-defined Semantic classes, or CACTUS. The tool performs either of the two tasks, using Hearst-style (Hearst, 1992) search queries with a web search engine: assignment of NEs/CNs to user-provided semantic classes, and semantic clustering of the NEs/CNs into a user-defined number of classes. We evaluate our approach on a dataset of 400 NEs/CNs and obtain encouraging results indicating that webquery-based semantic categorization in the (nearly) untrained open domain acheives accuracy comparable to supervised, limited-domain systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

  One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares active user’s items rating with historical rating records of other users to find similar users and recommending items which seems interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

متن کامل

Analysis of User query refinement behavior based on semantic features: user log analysis of Ganj database (IranDoc)

Background and Aim: Information systems cannot be well designed or developed without a clear understanding of needs of users, manner of their information seeking and evaluating. This research has been designed to analyze the Ganj (Iranian research institute of science and technology database) users’ query refinement behaviors via log analysis.    Methods: The method of this research is log anal...

متن کامل

Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches, guide the researchers to use combining approaches and semi-automatic retrieval using the user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provided two kind of qu...

متن کامل

SemTK: An Ontology-first, Open Source Semantic Toolkit for Managing and Querying Knowledge Graphs

The relatively recent adoption of Knowledge Graphs as an enabling technology in multiple high-profile artificial intelligence and cognitive applications has led to growing interest in the Semantic Web technology stack. Many semantics-related tools, however, are focused on serving experts with a deep understanding of semantic technologies. For example, triplification of relational data is availa...

متن کامل

Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems

Ontology is the main infrastructure of the Semantic Web which provides facilities for integration, searching and sharing of information on the web. Development of ontologies as the basis of semantic web and their heterogeneities have led to the existence of ontology matching. By emerging large-scale ontologies in real domain, the ontology matching systems faced with some problem like memory con...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008